IEEE International Conference on Computer Communications
20–23 May 2024 // Vancouver, Canada

Sponsor's Event

Sponsor’s Workshop

Monday, May 20, 2024 ● 14:00 – 16:00 ● Room: Georgia B

14:00–15:30

Invited Talks/Keynotes

15:30–16:00

Coffee Break

16:00–17:30

Panel: Network Optimization for Large-Scale AI Clusters

With the rapid growth in the size of modern AI models, large-scale distributed clusters with on the order of 1K, 10K, or even 100K cards have been widely deployed to meet memory and computation requirements. As a cluster grows, communication time increases and the network can become a bottleneck, resulting in sub-linear scaling of distributed training. Designing high-performance network systems that optimize communication in AI clusters is therefore both critical and challenging. Such optimization includes, but is not limited to, efficient network topologies, routing algorithms, traffic engineering, communication protocols, collective scheduling, and fast fault discovery and recovery. This panel will discuss the insights, challenges, and opportunities in network optimization of large-scale clusters for distributed AI training and inference.

17:30–18:00

Networking Break

Gold Patrons

Student Travel Grant Sponsors